To further study the impact of the different factors on criminality, we turn to econometric regressions.
We start with a standard OLS model in which the dependent variable is total criminality per 1000 inhabitants, defined as the sum of rape, homicide, violent crime and aggravated assault, each expressed per 1000 inhabitants. The OLS model is: \[ \begin{align*} Criminality \, per \, 1000 \, inhabitants&=\alpha+\beta_1\log(GDP)+\beta_2\log(mh\_exp\_pc)+\beta_3perc\_bscholder\_25\_44+\\ & \,\,+\beta_4 White+\beta_5 BlackAfricanAmerican+\beta_6Asian + \\ & \,\, + \beta_7Age\_0\_17 + \beta_8Age\_18\_24+ \beta_9Age\_25\_44+ \\ & \,\,+\beta_{10}Age\_45\_64+ \beta_{11}Age\_65\_84+\beta_{12}\log(population) \end{align*} \] Running the regression, we obtain the following coefficient estimates:
Dependent variable: total_criminality

| Predictors | Estimates | CI | p |
|---|---|---|---|
| (Intercept) | -146.90 | -254.92 – -38.88 | 0.008 |
| Current_dollar_GDP_millions [log] | 4.73 | 3.52 – 5.94 | <0.001 |
| mh_exp_pc [log] | -0.30 | -0.73 – 0.12 | 0.162 |
| perc_bscholder_25_44 | -0.09 | -0.15 – -0.03 | 0.002 |
| White | -30.76 | -36.96 – -24.57 | <0.001 |
| BlackAfricanAmerican | -19.52 | -25.57 – -13.48 | <0.001 |
| Asian | -58.17 | -69.43 – -46.90 | <0.001 |
| Age_0_17 | 158.71 | 50.56 – 266.85 | 0.004 |
| Age_18_24 | 197.17 | 78.31 – 316.04 | 0.001 |
| Age_25_44 | 241.96 | 142.86 – 341.06 | <0.001 |
| Age_45_64 | 161.16 | 55.17 – 267.14 | 0.003 |
| Age_65_84 | 255.53 | 128.73 – 382.32 | <0.001 |
| population [log] | -4.20 | -5.41 – -2.98 | <0.001 |
| Observations | 505 | ||
| R2 / R2 adjusted | 0.648 / 0.639 | ||
We notice that GDP, mh_exp_pc and the education proxy have the signs we expected from the EDA: GDP is associated with higher criminality, while mental health expenditure and education appear to decrease it. Of these, however, only \(\log(GDP)\) and education are statistically significant. Surprisingly, all race variables have a negative coefficient; this is not a convincing result, since the correlation between criminality and the Black or African American share appeared positive in the corrplot of the EDA section. The table also shows that all age groups, expressed as shares of the population, are significant. However, since all of their coefficients are positive, we suspect some misspecification leading to biased estimators. In general, we do not think this regression is informative for us, since it does not account for characteristics specific to each state and year. Indeed, by using a standard OLS we ignore the fact that our dataset is a panel.
Therefore, we declared our data frame as panel data and estimated regressions with fixed effects, random effects and first differences. Before proceeding, we briefly explain each of them:
We ran all of these regressions, but after some consideration we think the most appropriate for our case is the fixed effects method, for the following reasons:
An additional consideration is whether or not to use clustered standard errors. Their advantage is that they account for within-cluster correlation and heteroskedasticity, which the fixed-effects estimator alone does not take into account. Note that cluster-adjusted standard errors change the estimated standard errors but leave the point estimates unchanged. In our case, however, the results do not change in a relevant way whether or not we use cluster-adjusted standard errors.
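To illustrate why clustering changes the standard errors but not the point estimate, here is a minimal sketch for a single regressor without intercept. It is not the estimation used in this report: the data, the function name `slope_with_ses` and the absence of any small-sample correction are all assumptions made for illustration.

```python
from collections import defaultdict

def slope_with_ses(x, y, cluster):
    """No-intercept OLS of y on x; returns (beta, classical SE, cluster-robust SE)."""
    sxx = sum(xi * xi for xi in x)
    beta = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    resid = [yi - beta * xi for xi, yi in zip(x, y)]

    # Classical SE: treats all residuals as independent with equal variance.
    sigma2 = sum(u * u for u in resid) / (len(x) - 1)
    se_classical = (sigma2 / sxx) ** 0.5

    # Cluster-robust SE: add up x*u within each cluster before squaring,
    # so correlated residuals inside a cluster are not treated as independent.
    # No small-sample correction is applied in this sketch.
    score = defaultdict(float)
    for xi, ui, g in zip(x, resid, cluster):
        score[g] += xi * ui
    se_cluster = sum(s * s for s in score.values()) ** 0.5 / sxx
    return beta, se_classical, se_cluster

# Hypothetical, already demeaned data: three states, three years each.
x = [-1.0, 0.0, 1.0, -2.0, 0.0, 2.0, -0.5, 0.0, 0.5]
y = [-1.2, 0.1, 1.1, -1.5, -0.2, 1.7, -0.9, 0.3, 0.6]
state = ["A"] * 3 + ["B"] * 3 + ["C"] * 3

beta, se_classical, se_cluster = slope_with_ses(x, y, state)
print(beta, se_classical, se_cluster)
```

The slope is computed before the cluster labels are ever used, which is why only the standard error depends on how observations are grouped.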
We would also like to point out another consideration that arose while running the regressions. In the EDA we saw that rape seems to be the only kind of crime, among those we consider, that behaves differently and is influenced differently by GDP and, to a lesser extent, by the other variables. For this reason we ran separate regressions with the following dependent variables (each per 1000 inhabitants):
In all the regressions we exclude the United States aggregate, since it would be redundant, being the total of the individual states.
We report here the results that we consider worth mentioning. As said above, we select the fixed effects method. The model is: \[ \begin{align*} Y_{i,t} &=\alpha_i+\beta_1\log(GDP)+\beta_2\log(mh\_exp\_pc)+\beta_3perc\_bscholder\_25\_44+\\ & \,\,+\beta_4 White+\beta_5 BlackAfricanAmerican+\beta_6Asian + \\ & \,\, + \beta_7Age\_0\_17 + \beta_8Age\_18\_24+ \beta_9Age\_25\_44+ \\ & \,\,+\beta_{10}Age\_45\_64+ \beta_{11}Age\_65\_84+\beta_{12}\log(population) \end{align*} \] \(Y_{i,t}\) refers to the dependent variable for state \(i\) at time \(t\), and \(\alpha_i\) is the intercept specific to state \(i\). The estimation works with \(Y_{i,t}-\bar{Y}_i\) (and likewise with each regressor), where \(\bar{Y}_i\) is the mean of the dependent variable for state \(i\). As a result, \(\alpha_i\) does not appear in the results, since it is constant over time.
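The demeaning step can be sketched as follows. This is an illustration on hypothetical toy data, not the code used for the tables in this report; the helper name `within_transform` and the one-regressor setup are assumptions. Each state gets its own intercept, and subtracting state means removes it, so a regression on the demeaned data recovers the common slope.

```python
from collections import defaultdict

def within_transform(values, groups):
    """Subtract each group's mean from the observations of that group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]

# Hypothetical panel: two states, three years, y = alpha_state + 2 * x.
state = ["A", "A", "A", "B", "B", "B"]
alpha = {"A": 10.0, "B": -5.0}
x = [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]
y = [alpha[s] + 2.0 * xi for s, xi in zip(state, x)]

x_dm = within_transform(x, state)
y_dm = within_transform(y, state)

# Slope of the demeaned regression (no intercept is needed after demeaning).
slope = sum(a * b for a, b in zip(x_dm, y_dm)) / sum(a * a for a in x_dm)
print(slope)  # 2.0
```

The state intercepts 10 and -5 never enter the slope calculation, which mirrors why the intercept is absent from the fixed effects output.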
For total criminality, the regression results are:
Dependent variable: total_criminality

| Predictors | Estimates | CI | p |
|---|---|---|---|
| Current_dollar_GDP_millions [log] | 3.88 | 2.74 – 5.01 | <0.001 |
| mh_exp_pc [log] | 0.16 | -0.20 – 0.51 | 0.392 |
| perc_bscholder_25_44 | -0.05 | -0.13 – 0.02 | 0.149 |
| White | -33.85 | -81.35 – 13.64 | 0.163 |
| BlackAfricanAmerican | -3.33 | -53.89 – 47.23 | 0.897 |
| Asian | -75.27 | -131.71 – -18.83 | 0.009 |
| Age_0_17 | 268.59 | 87.68 – 449.50 | 0.004 |
| Age_18_24 | 242.92 | 42.55 – 443.30 | 0.018 |
| Age_25_44 | 260.26 | 69.73 – 450.78 | 0.008 |
| Age_45_64 | 267.36 | 70.77 – 463.94 | 0.008 |
| Age_65_84 | 246.68 | 50.23 – 443.12 | 0.014 |
| population [log] | -15.85 | -20.10 – -11.59 | <0.001 |
| Observations | 505 | ||
| R2 / R2 adjusted | 0.471 / 0.397 | ||
We notice that the \(R^2\), which measures the proportion of the variance of the dependent variable explained by the independent variables, is lower here than in the standard OLS. Relative to the OLS estimates, the magnitudes change but the signs do not. The only exception is mental health expenditure, which here appears to have a positive effect on criminality. However, mh_exp_pc and the education proxy are no longer statistically significant. Among the race variables, only the share of Asian population is statistically significant, and it still has a negative effect on criminality. As in the OLS estimates, \(\log(population)\) decreases criminality: because population enters in logs, a 1% increase in population is associated with a decrease of about \(15.85/100 \approx 0.16\) crimes per 1000 inhabitants.
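Since the dependent variable is in levels and population enters in logs, the effect of a percentage change in population is the coefficient times the change in the log. The short sketch below works this out for the estimated coefficient of -15.85; the function name `level_log_effect` is hypothetical and introduced only for illustration.

```python
import math

def level_log_effect(beta, pct_change):
    """Change in Y when X rises by pct_change percent and X enters as log(X)."""
    return beta * math.log(1 + pct_change / 100)

beta_population = -15.85  # coefficient on log(population) from the table above

# A 1% increase changes the log by ln(1.01), which is close to 0.01,
# so the effect is roughly beta / 100.
print(level_log_effect(beta_population, 1))

# Larger changes need the exact log difference, e.g. a 10% increase.
print(level_log_effect(beta_population, 10))
```

For small percentage changes the shortcut \(\beta/100\) per 1% is accurate enough; for larger changes the exact log difference should be used.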
Among the various regressions we ran, only those with rape and homicide as dependent variables give results that differ from the ones presented above.
For Rape:
Dependent variable: rape_legacy

| Predictors | Estimates | CI | p |
|---|---|---|---|
| Current_dollar_GDP_millions [log] | 0.0762 | 0.0087 – 0.1438 | 0.027 |
| mh_exp_pc [log] | 0.0131 | -0.0081 – 0.0343 | 0.226 |
| perc_bscholder_25_44 | -0.0001 | -0.0044 – 0.0043 | 0.976 |
| White | 1.2084 | -1.6239 – 4.0407 | 0.403 |
| BlackAfricanAmerican | 1.5597 | -1.4553 – 4.5746 | 0.311 |
| Asian | -0.2739 | -3.6396 – 3.0918 | 0.873 |
| Age_0_17 | 2.8333 | -7.9549 – 13.6216 | 0.607 |
| Age_18_24 | 3.3037 | -8.6452 – 15.2527 | 0.588 |
| Age_25_44 | 5.9642 | -5.3975 – 17.3260 | 0.304 |
| Age_45_64 | 4.9356 | -6.7876 – 16.6588 | 0.410 |
| Age_65_84 | 3.1876 | -8.5273 – 14.9025 | 0.594 |
| population [log] | -0.3180 | -0.5715 – -0.0644 | 0.014 |
| Observations | 505 | ||
| R2 / R2 adjusted | 0.216 / 0.106 | ||
From the FE regression with Rape per 1000 inhabitants as dependent variable we learn that:
For Homicides:
Dependent variable: homicide

| Predictors | Estimates | CI | p |
|---|---|---|---|
| Current_dollar_GDP_millions [log] | 0.051 | 0.037 – 0.065 | <0.001 |
| mh_exp_pc [log] | 0.004 | -0.001 – 0.008 | 0.084 |
| perc_bscholder_25_44 | -0.001 | -0.002 – -0.000 | 0.004 |
| White | -0.061 | -0.661 – 0.539 | 0.842 |
| BlackAfricanAmerican | 1.212 | 0.574 – 1.851 | <0.001 |
| Asian | -0.820 | -1.533 – -0.107 | 0.025 |
| Age_0_17 | 2.520 | 0.235 – 4.805 | 0.031 |
| Age_18_24 | 1.483 | -1.048 – 4.014 | 0.251 |
| Age_25_44 | 0.931 | -1.475 – 3.338 | 0.448 |
| Age_45_64 | 1.394 | -1.088 – 3.877 | 0.272 |
| Age_65_84 | 2.087 | -0.394 – 4.568 | 0.100 |
| population [log] | -0.209 | -0.262 – -0.155 | <0.001 |
| Observations | 505 | ||
| R2 / R2 adjusted | 0.683 / 0.638 | ||
From the FE regression with Homicides per 1000 inhabitants as dependent variable we learn that:
The answer is inconclusive. In the corrplot, mental health expenditure shows slightly positive correlations with crimes (the only exception being rape), but in the regressions it is not statistically significant. Moreover, the relationship between mental health expenditure and crimes appears negative in the scatterplot and the time series shown in earlier sections.
For GDP we can say that:
For Education we can say that:
The age distribution of the population does not vary significantly across states and regions, so our study cannot say much about it. The only evidence on age distribution that we can draw from our project comes from the corrplot: a younger population (18-44) is associated with more homicides, aggravated assaults and violent crimes, while an older population (45+) appears negatively related to crimes. The regression output, however, is inconclusive, since the estimates are all positive and very large in magnitude.
The racial composition of the population could play a role. Indeed, the South region of the US has the highest percentage of Black or African American population and the highest incidence of crimes, which supports the positive correlation found in the corrplot between all kinds of crime and the Black or African American share. The White share is positively correlated with rape. In the regressions, however, the coefficients for all races are negative when looking at total criminality. For homicides, the significant race estimates are those for BlackAfricanAmerican (a one-unit increase is associated with about 1.2 more homicides per 1000 inhabitants) and Asian (a one-unit increase is associated with about 0.8 fewer homicides per 1000 inhabitants). If these variables are measured as population shares between 0 and 1, this corresponds to roughly 0.012 more and 0.008 fewer homicides per 1000 inhabitants for a one-percentage-point increase.
Looking at the correlations and the time series reported in the previous sections, we would answer yes. There is a positive relationship between the two variables: the more educated the population, the higher the expenditure on mental health in the state. These findings can also be seen in the following scatterplot with the fitted linear regression.